Soft Cardinality: A Parameterized Similarity Function for Text Comparison

نویسندگان

  • Sergio Jiménez
  • Claudia Jeanneth Becerra
  • Alexander F. Gelbukh
چکیده

We present an approach for the construction of text similarity functions using a parameterized resemblance coefficient in combination with a softened cardinality function called soft cardinality. Our approach provides a consistent and recursive model, varying levels of granularity from sentences to characters. Therefore, our model was used to compare sentences divided into words, and in turn, words divided into q-grams of characters. Experimentally, we observed that a performance correlation function in a space defined by all parameters was relatively smooth and had a single maximum achievable by “hill climbing.” Our approach used only surface text information, a stop-word remover, and a stemmer to tackle the semantic text similarity task 6 at SEMEVAL 2012. The proposed method ranked 3rd (average), 5th (normalized correlation), and 15th (aggregated correlation) among 89 systems submitted by 31 teams.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Text Comparison Using Soft Cardinality

The classical set theory provides a method for comparing objects using cardinality and intersection, in combination with well-known resemblance coefficients such as Dice, Jaccard, and cosine. However, set operations are intrinsically crisp: they do not take into account similarities between elements. We propose a new general-purpose method for comparison of objects using a soft cardinality func...

متن کامل

SC Spectra: A Linear-Time Soft Cardinality Approximation for Text Comparison

Soft cardinality (SC) is a softened version of the classical cardinality of set theory. However, given its prohibitive cost of computing (exponential order), an approximation that is quadratic in the number of terms in the text has been proposed in the past. SC Spectra is a new method of approximation in linear time for text strings, which divides text strings into consecutive substrings (i.e.,...

متن کامل

Soft Cardinality + ML: Learning Adaptive Similarity Functions for Cross-lingual Textual Entailment

This paper presents a novel approach for building adaptive similarity functions based on cardinality using machine learning. Unlike current approaches that build feature sets using similarity scores, we have developed these feature sets with the cardinalities of the commonalities and differences between pairs of objects being compared. This approach allows the machine-learning algorithm to obta...

متن کامل

Soft Cardinality in Semantic Text Processing: Experience of the SemEval International Competitions

Soft cardinality is a generalization of the classic set cardinality (i.e., the number of elements in a set), which exploits similarities between elements to provide a “soft” counting of the number of elements in a collection. This model is so general that can be used interchangeability as cardinality function in resemblance coefficients such as Jaccard’s, Dice’s, cosine and others. Beyond that,...

متن کامل

UNAL-NLP: Combining Soft Cardinality Features for Semantic Textual Similarity, Relatedness and Entailment

This paper describes our participation in the SemEval-2014 tasks 1, 3 and 10. We used an uniform approach for addressing all the tasks using the soft cardinality for extracting features from text pairs, and machine learning for predicting the gold standards. Our submitted systems ranked among the top systems in all the task and sub-tasks in which we participated. These results confirm the resul...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012